A Study of Likelihood Ratio Calibration in High Vocal Effort Speech for a Modern Automatic Speaker Recognition System
Abstract
Speech production is influenced not only by intrinsic factors such as semantics, dialect, human perspective and emotion, but also by extrinsic factors such as environmental conditions and the transmission channel. In certain acoustic conditions, speakers tend to raise their vocal effort to overcome environmental hindrances such as the presence of noise or a long distance between speaker and listener. Only a few studies have addressed speaker recognition under non-neutral speech production conditions (i.e., high or low vocal effort and speech under stress) (Hansen, 2011). In real forensic cases, however, the incriminating recording may have been made with high vocal effort, and this then has to be dealt with in speaker comparison.
Similar resources
Investigating the COG ratio as feature for speaker verification on high-effort speech
A mismatch in vocal effort between training and test data severely degrades the performance of speaker recognition systems. The changes in the acoustics of a speech signal induced by raised vocal effort are complex and, despite several studies by various authors, not yet completely understood. Instead of just gaining knowledge about these differences for automatic speaker recognition it is rather essential to ...
Fast Estimation of Warping Factors in Vocal Tract Length Normalization Using Scores Obtained from Gender Detection Modeling
The performance of automatic speech recognition (ASR) systems is adversely affected by the variations in speakers, audio channels and environmental conditions. Making these systems robust to these variations is still a big challenge. One of the main sources of variations in the speakers is the differences between their Vocal Tract Length (VTL). Vocal Tract Length Normalization (VTLN) is an effe...
Effective Segmentation based on Vocal Effort Change Point Detection
Non-neutral speech data has a strong negative impact on speech processing systems such as Automatic Speech Recognition (ASR) or speaker ID systems [1]. It is therefore necessary to detect and segment non-neutral speech data before further processing steps. Alternatively, the detection and segmentation of non-neutral speech segments from an input speech stream can be used in speech analysis and ...
Speaker Line-up Calibration of the i-vector Based Speaker Recognition System for Forensic Application
An automatic speaker recognition (ASR) system must produce reliable likelihood ratios (LR) in order to be used for evaluating and presenting speech evidence in court. The LR is only reliable if it is produced by a well-calibrated ASR system. A study by Rodriguez (2007) showed that the LR calculated from the uncalibrated system was often misleading, while the calibrated system produced more reliable LR...
A Database for Automatic Persian Speech Emotion Recognition: Collection, Processing and Evaluation
Recent developments in robotics automation have motivated researchers to improve the efficiency of interactive systems by enabling natural man-machine interaction. Since speech is the most popular method of communication, recognizing human emotions from the speech signal has become a challenging research topic known as Speech Emotion Recognition (SER). In this study, we propose a Persian em...